Accurate interval estimation for the risk difference in an incomplete correlated 2 × 2 table: Calf immunity analysis

Interval estimation with accurate coverage for risk difference (RD) in a correlated 2 × 2 table with structural zero is a fundamental and important problem in biostatistics. The score test-based and Bayesian tail-based confidence intervals (CIs) have good coverage performance among the existing methods. However, as approximation approaches, they have coverage probabilities lower than the nominal confidence level for finite and moderate sample sizes. In this paper, we propose three new CIs for RD based on the fiducial, inferential model (IM) and modified IM (MIM) methods. The IM interval is proven to be valid. Moreover, simulation studies show that the CIs of fiducial and MIM methods can guarantee the preset coverage rate even for small sample sizes. More importantly, in terms of coverage probability and expected length, the MIM interval outperforms other intervals. Finally, a real example illustrates the application of the proposed methods.


Introduction
An incomplete correlated 2 × 2 table exists in various biological studies, clinical trials and epidemiological studies [1] used this data structure to evaluate penicillin allergy [2,3] studied tuberculosis skin tests in a two-step procedural design. A typical example is given by [4] involving calf immunity data. Calves were first classified according to whether they had a primary pneumonia infection and then reclassified according to whether they developed a secondary infection within a certain time period after the first infection cleared up [5]. Since the subject cannot have secondary infection when it is not infected at the first stage, the cells corresponding to secondary infection without primary infection will not appear. Table 1 lists the observed data and related probabilities.
Suppose there is a sample of n subjects, let X 11 be the number of subjects infected in both stages, X 12 be the number of subjects who have a primary infection but do not have a secondary infection, and X 22 be the number of subjects who are not infected in both stages, so that X 1 + = X 11 +X 12 and X 1+ +X 22 = n. Here, p 11 , p 12  corresponding cells, with p 1+ = p 11 +p 12 and p 1+ +p 22 = 1. After the first infection clears up, one wants to infer the likelihood of developing a secondary infection due to the effect of the primary infection. One common comparative measure of interest is the risk difference (RD) between the primary infection and the secondary infection, given the primary infection. The RD δ is defined by From the frequentist point of view [6], considered confidence intervals (CIs) for the RD based on Wald's test statistic, the likelihood ratio test and the basic principle of Fieller's theorem. Although these three approaches behave well in many practical problems, the CIs derived from Wald's test statistic, the likelihood ratio test and Fieller's theorem fail to reach the preset confidence level threshold even in moderate sample sizes. As an alternative [2], proposed a score test-based CI and, although the score test statistic is undefined in one scenario, the score CI outperforms all other frequentist intervals in terms of the coverage probability. More literature on risk factors and some potential methods can be found in [6][7][8][9][10][11][12][13][14][15].
If a prior distribution is available for the unknown parameter, the Bayesian posterior distribution provides a meaningful summary of the uncertainty about the parameter. To date, the Bayesian approach has been widely used in the interval estimation of the proportions in correlated tables [16] investigated the performance of Bayesian intervals using different priors, and they found that the Jeffreys prior is comparable to the score test-based CI [17] used a Bayesian estimation of the false-negative rate in a clinical trial of a sentinel node biopsy. Moreover [5], used Dirichlet priors to construct the tail-based interval for the RD. Simulation studies showed that the Bayesian CI at the Jeffreys prior has a shorter expected length than the score test-based CI.
The Bayesian choice of priors is a powerful tool for inferring purposes, but different priors can lead to different posterior distributions. To eliminate the influence of the prior distribution on the inference result, the generalized fiducial inference [18,19] is asymptotically correct and works well in applications [20,21]. However, the determination of the fiducial distribution of parameters remains a problem. Unlike the fiducial method, CIs derived from inferential models (IMs) [22][23][24] can always guarantee nominal coverage for all sample sizes. The main difference between the IM and the fiducial inference is that the IM method always carries out probability calculation in the auxiliary variable space, which can ensure that its inference is strict and correct. Moreover, the IM theory of precise inference needs some improvements. For example [25][26][27][28][29], constructed a randomized IM inference for discrete proportions.
The frequentist methods can be undefined in some cases. Moreover, as two representative CIs, the score and Bayesian intervals cannot guarantee the nominal coverage probability for small to moderate sample sizes. The aim of this paper is to construct three new CIs with accurate coverage for the RD mainly based on the fiducial, IM and randomized IM approaches.

Score test-based CI
From Table 1, the observed vector (X 11 ,X 12 ) comes from the trinomial distribution model PðX 11 ; X 12 jp 11 ; p 12 Þ ¼ n X 11 ; X 12 ! p X 11 11 p X 12 12 ð1 À p 11 À p 12 Þ nÀ X 11 À X 12 : Letp 1þ ¼ ðX 11 þ X 12 Þ=n be the maximum likelihood estimate of p 1+ ; then, the score test statistic [2] for hypothesis H 0 :δ = δ 0 is given by ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi It is well-known that T S (X 11 ,X 12 ,X 22 ,δ 0 ) is an asymptotically standard normal distribution under H 0 . Given the significance level α, the two-sided 1−α approximate confidence limits (δ L and δ U ) for δ can be computed by solving equations The score test is comparable to that of the likelihood ratio test [6], but the likelihood ratio test can be undefined in many scenarios while the score test statistic is undefined when X 11 = X 12 = 0.

New confidence intervals
The small sample properties for approximate score and Bayesian methods may not be calibrated for meaningful probabilistic inference. These two CIs have lower coverage probability in some cases. To date, fiducial and IM-based methods have been used in other inference problems where classic approaches cannot lead to valid inference. Therefore, we consider fiducial and IM solutions for the RD.

Fiducial CI
Fiducial inference is based entirely on the fiducial distribution of the parameters. We first review the fiducial framework [18]. Suppose that X is a random variable indexed by a parameter θ which is generated using the structural equation given by where a is a measurable function and U is an auxiliary variable with a known distribution. Note that the distribution of U is free of θ. Let x and u be realization of X and U, respectively. The information provided by x and u about θ is encapsulated in the set Yðx; uÞ ¼ fyjx ¼ aðy; uÞg: Clearly, x = a(θ,u) is equivalent to Θ(x,u)6 ¼;. The fiducial argument involves replacing u with U' to obtain the random set Θ(x,U'), where U' is an independent copy of U. Since the fiducial distribution of θ is given by Then, we can easily derive a fiducial CI for θ. From Table 1, if the condition is based on the number (x 11 +x 12 ), such that all marginal totals are fixed, the probability of observing x 11 follows a binomial distribution. By derivation, letting T = X 11 +X 12 and X = X 11 , the probability function of (x 11 ,x 12 ) in (1) can be simplified as the product of two independent binomial distributions as follows: where θ = p 11 +p 12 and φ = p 11 /(p 11 +p 12 ). Naturally, we have T~Bin(n,θ) and X|T = t~Bin(t,φ). Moreover, if we define the value of the random variable to be 1 if a trial results in success, and 0 otherwise, then the structural equation can be given by where I A (�) is the indicator function and U i , i = 1,. . .,n, and V j , j = 1,. . .,t, are independent uniform(0,1) random variables.
Due to the discrete nature, event A k 1 Moreover, the fiducial quantity for the RD d ¼ y À φ is given by Since we can easily obtain a Monte Carlo approximation for the distribution of the two endpoints, constructing a fiducial CI for δ is not problematic. Remark 1. Compared to Bayesian inference, the fiducial idea may be more attractive because no prior distributions are needed. Moreover, the fiducial idea has been shown to be asymptotically correct for a single discrete population. However, there is little research on risk factors between two independent binomial distributions. As an extension, our research can fill this gap and is easily applied to other multiparameter inference problems. Furthermore, our simulation studies will show that the fiducial method can guarantee nominal coverage even for small sample sizes.

IM-based CI
Martin and Liu [22] proposed an IM framework for valid probabilistic inference. The IM starts with an association between data, parameters and auxiliary variables. Using the optimal predictive random set to predict the auxiliary variables, the IM produces a postdata probabilistic measure function of uncertainty about the unknown parameter. To summarize, the IM has the following three steps: A-step: From an appropriate mapping a: x = a(θ,U), the IM associates the parameter θ with (x,u) for each possible pair to obtain a collection of sets Θ x (u) of candidate values. P-step: Given data x , suppose θ � is the true value of θ, there exists a u � , such that x = a(θ � , u � ). Moreover, the true value u � is predicted with a valid predictive random set S(u). The validity condition ensures that S(u) will hit u � with large probability.
C-step: The A-step and P-step are combined to obtain a final random set of θ, that is, Θ x (S (u)) = [ u2S(u) Θ x (u). Then, for any assertion A about the parameter of interest θ, the probabilities that bel x (A) = P{Θ x (S(u))�A} and pl x (A) = P{Θ x (S(u))⊄A c } are computed as two measure functions of the available evidence in x supporting A.
The belief function bel x (A) and the plausibility function pl x (A) are known as the minimum and maximum probabilities that support the truth of assertion A. It is more convenient to report the plausibility function, which can easily be used to create frequentist procedures. To test the assertion A = {δ:δ = δ 0 }, we reject H 0 :δ = δ 0 if pl x (A)�α for a significance level α, and this plausibility function yields a two-sided 1−α IM CI {δ: pl x (A)>α}.
Theorem 1 [22]. Suppose X~P X|θ and let A be an assertion of interest, the IM with the plausibility function pl X (A) is valid for assertion A if, for each α2(0,1), sup y2A P Xjy fpl X ðAÞ � ag � a: The resulting IM CI can guarantee the nominal coverage probability when the plausibility function pl X (A) is said to be valid. Moreover, if "�α" can be replaced by "= α", then the IM CI controls the coverage probability exactly at the confidence level 1−α.
Here, we construct a new IM CI for the RD. Let F n,θ (�) denote the distribution function of T~Bin(n,θ). Martin and Liu [22] gave an association linking t, θ, and an auxiliary variable u~P u as follows: Moreover, the association model (7) can be simplified as Similarly, the association for x, given φ, may be written as To derive an initial IM's association for δ = θ−φ, we will take advantage of the well-known relationship between the binomial and beta distribution functions, that is, F n;y ðtÞ ¼ 1 À G tþ1;nÀ t ðyÞ, where G a,b (�) is the beta (a, b) distribution function. Furthermore, we can rewrite joint associations (8) and (9) as follows: where u and v are i.i.d. uniform(0,1) random variables. Hence, the A-step of the IM for δ is and H t,x−1 (�) be the distribution functions of the two endpoints in (10). The association step of IM for the RD is

P-step:
We are interested in two-sided CIs. Following [22], for a singleton assertion A = {δ}, the default predictive random set for the auxiliary variable w, S ¼ fw : jw À 0:5j � jW À 0:5jg, W � Unif ð0; 1Þ: we find that the corresponding plausibility function for a singleton assertion A = {δ} is where the ''+" superscript denotes the positive part. Theorem 2. According to Theorem 1, the plausibility function pl t,x (A) of our IM method is valid for assertion A if, for each α2(0,1), sup d2A P T;Xjd fpl T;X ðAÞ � ag � a: Proof. Given any α2(0,1), we have P T;Xjd fpl T;X ðAÞ � ag Then, sup d2A P T;Xjd fpl T;X ðAÞ � ag � a: Hence, the proof is complete. For any α2(0,1), if pl t,x (A)�α, then the assertion A is wrong. Moreover, this plausibility function yields an IM 1−α CI {δ: The IM CI has the same form as the generalized fiducial CI. However, these two intervals are obtained under different theoretical derivations, fiducial approaches and IM theories, respectively. While the general fiducial distributions in (6) may not be calibrated for meaningful probabilistic inference, IM provides meaningful probabilistic summaries of the information in data concerning the quantity of interest. Moreover, the IM CI derived from a valid plausibility function of IM can guarantee the nominal coverage probability for all sample sizes.

Modified IM CI
In general, the association model (10) of IM is an interval, which will result in a conservative CI. Some adjustments are needed to handle this discreteness. Inspired by the randomized IM idea [25], we consider a modified IM (MIM) approach to modify in Eq (10) to an accurate equation so that we can improve the accuracy of the candidate value of δ.
Proof. Given any α2(0,1), since K t,x (δ)~Unif(0,1) for (t,x)~P (t,x)|δ for all δ, and Moreover, the predictive random set S~P S is valid, that is, P S (w2S)� st Unif(0,1). Hence, sup d2A P ðt;xÞjd fpl 0 t;x ðAÞ � ag ¼ sup d2A P ðt;xÞjd fP S fw 2 Sg � ag � sup d2A P ðt;xÞjd fUnif ð0; 1Þ � ag ¼ a: The MIM inference is valid, by Theorem 1. Hence, the proof is complete. The plausibility function of our MIM method yields a new CI ½d L ; d U � ¼ fd : pl 0 t;x ðAÞ > ag, where δ L and δ U satisfy K t,x (δ L ) = α/2 and K t,x (δ U ) = 1−α/2. Remark 3. Similar to the classical approaches, the Monte Carlo method is also an approximate solution. The main difference between our MIM approach and other approximations is that the accuracy of MIM depends on the repetition times N, but accuracies of other approximations depend on the sample size n. We recommend N = 1,000,000 in practical applications to assure that there is a greater than 95% probability of the absolute error being less than 0.001.

Simulation results
The fiducial and IM CIs have the same form. The score, Bayesian, fiducial and MIM approaches are approximations. We conduct some Monte Carlo simulations to assess the performance of fiducial and MIM intervals, and compare them to the score and Bayesian intervals. Since it is often difficult to obtain explicit expressions of the fiducial and MIM CIs, we suggest approximating these two CIs using the following Monte Carlo algorithms. R codes are available in the S1 Appendix. Table 2 lists four 95% CIs for various combinations including some special cases of zero cells. We see that the Bayesian, fiducial and MIM intervals are well defined for all cases. However, when X 11 = X 12 = 0, the score interval does not exist and the fiducial and MIM CIs have shorter widths than the score and Bayesian CIs. Note that the expected lengths of the score, Bayesian and MIM intervals are almost the same when the sample size increases. __________________________________________________________________________

Algorithm 1: Fiducial CI (δ α/2 , δ 1−α/2 ) --------------------------------------------------------------------
Step 1. For the given sample (x 11 ,x 12 ) from the incomplete correlated 2 × 2 table, t = x 11 +x 12 and x = x 11 are calculated; Step 2. Then, u 1 ,u 2 ,. . .,u t+1 and v 1 ,v 2 ,. . .,v x+1 are generated from a uniform(0,1) distribution, and δ is calculated using Step 2 is repeated N times (1,000,000 for example) to obtain N realizations of δ; Step 4. The α/2 quantile of X t i¼1 U i À X xþ1 j¼1 V j and the 1−α/2 quantile of X tþ1 i¼1 U i À X x j¼1 V j are calculated to approximate δ α/2 and δ 1−α/2 , respectively. Step 2. Then ω 1 , ω 2 , u and v are randomly sampled from the uniform(0, 1) distribution, and equations u ¼ o 1 F n;y ðt À 1Þ þ ð1 À o 1 ÞF n;y ðtÞ and v ¼ o 2 F t;φ ðx À 1Þ þ ð1 À o 2 ÞF t;φ ðxÞ are solved to obtain the unique solution (θ,φ). Then, δ = θ−φ is calculated; Step 3. Step 2 is repeated N times (1,000,000 for example) to obtain N realizations of δ; Step 4. The α/2 and 1−α/2 quantiles of δ are calculated to approximate δ L and δ U , respectively.  For comparison, simulation studies are conducted to examine the performances of the score, Bayesian, fiducial and MIM CIs under different sample sizes. The parameters of the comparison are mainly the coverage probability and expected length. Refer to [2,5] for the parameter settings. We consider p 1+ = 0.3, 0.5 and 0.8; δ = −0.3 (0.1) 0.3; and n = 20, 50, 100 and 300. In each simulation, we first resample the observed value (X 11 ,X 12 ) 10,000 times from the trinomial distribution, calculate the four different CIs accordingly, and compute the corresponding frequencies that cover δ. We regard the coverage frequency as the coverage probability. According to the central limit theorem, the coverage frequency of the nominal 95% where δ U and δ L are the upper and lower limits of the interval, respectively. We report the simulation results in Tables 3 and 4. Cases in which the coverage probability is less than 0.9457 appear in bold underlined. Clearly, the score and Bayesian CIs cannot guarantee the nominal coverage probability for small to moderate samples, except for the fiducial and MIM CIs. Moreover, the score and MIM intervals have similar coverage in most cases, but the expected lengths of score CIs are longer than those of MIM CIs. For example, when n = 20, p 1+ = 0.3 and δ = 0.2, the coverage rates of the score and MIM CIs are 0.9647 and 0.9679, respectively. In this case, the expected lengths of the score CI (0.93) are significantly longer than those of the MIM CI (0.70). Furthermore, it seems that the score CI has a more accurate coverage probability than the MIM CI in some cases such as n = 50, p 1+ = 0.3 and δ = 0.1, but our MIM CI also has a shorter length than the score CI. Note that our MIM method uses a shorter interval to obtain higher coverage, which shows that the MIM CI outperforms the score CI. From Table 4, when the sample size increases, the coverage probabilities of the score, Bayesian and modified IM CIs all fall in the interval (0.9457, 0.9543). Although the fiducial CI has a slightly larger coverage rate than other intervals, the expected lengths of the four CIs are almost the same. In this sense, the fiducial method is not inferior to existing methods. Moreover, according to the central limit theorem, the large sample properties indicate that the theoretical results of the score, Bayesian, fiducial and modified IM methods will tend to be consistent. In summary, the fiducial and MIM methods can improve the poor coverage probabilities of the score and Bayesian approaches for small to moderate sample sizes. For larger samples, the CIs of score, Bayesian and MIM are the same, and the fiducial interval is not inferior to the other three intervals. Compared with the fiducial interval, the MIM interval exhibits more accurate coverage with a shorter expected length for all sample sizes. Hence, in terms of coverage probability and expected length, the MIM CI is the best of all.
To obtain a better understanding of the different performances of various intervals for small sample sizes. Let n = 30. Figs 1-3 give plots of the coverage probabilities and expected

PLOS ONE
lengths of four CIs versus p 12 for fixed p 11 = 0.1, 0.2 and 0.5, respectively. Here, we draw a dashed line (y = 0.9457) as the maximum lower bound of the coverage probability. Clearly, the score and Bayesian intervals have coverage probabilities lower than 0.9457 in some cases, especially when p 12 is very close to zero. In contrast, the fiducial approach can always guarantee nominal coverage for all cases, and the MIM CI improves the conservative coverage of the fiducial CI. For expected length, the fiducial and MIM methods have shorter expected lengths than score and Bayesian approaches when p 12 is close to zero. Moreover, although the score and Bayesian CIs have shorter expected lengths in most cases, differences between various expected lengths are small when p 12 becomes larger. Note that the score and Bayesian CIs cannot guarantee the preset coverage probability; hence, the fiducial and MIM intervals are superior to the score and Bayesian intervals. Following [30], when the observed data (t,x) cannot be separated from the auxiliary variables (u',v'), the validity condition K t,x (δ)~Unif(0,1) may not be automatic. However, different from other approaches, we can check the good coverage performance of the MIM method for different parameter settings, p 11 = 0.1 (0.2) 0.5; p 12 = 0.1 (0.1) 0.5; n = 20 and 50. In each simulation, we take 10,000 samples.  where W~Unif(0,1). In particular, from Fig 7-9, the distribution function of K t,x (δ) is very close to that of Unif(0, 1) for a moderate sample size, hence the MIM method has accurate coverage.

A real data analysis
We illustrate the application of the proposed methods with a real example. A sample of 156 dairy calves born in Florida, were classified according to whether they had pneumonia within 60 days after birth [4]. Calves that got a pneumonia infection were also classified according to whether they got a secondary infection within two weeks after the first infection cleared up. Table 5 shows the data. Calves that did not get a primary infection could not get a secondary infection, so no observations can fall in the cell for "no" primary infection and "yes" secondary infection. The goal of this study was to test whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection.
Here we used the RD to study the effect of primary infection on the likelihood of developing secondary infection. Under the 95% confidence level, the score, Bayesian, fiducial and MIM CIs for δ are (0.15, 0.39), (0.15, 0.39), (0.14, 0.40) and (0.15, 0.39), respectively. Clearly, the lower bounds of the four intervals are all larger than 0. It is suggested that the primary infection of pneumonia should stimulate a natural immunity to reduce the likelihood of secondary infection. Hence, the fiducial and MIM methods work well with calf immunity data. More importantly, the MIM method also provides probabilistic summaries of the information in data concerning the quantity of interest. To be more informative, we plot the plausibility t;x ðdÞ, as a function of δ in Fig 10. By locating α = 0.05 on the vertical axis, we can easily find that the lower bound (0.15) and the upper bound (0.39) are in the MIM CI. Furthermore, the plausibility function shows that each point δ in the MIM interval is individually sufficiently plausible. Clearly, no frequentist or Bayesian interval can assign such a meaning to the individual elements it contains. In this sense, the proposed MIM interval is recommended for practical use.

Discussion
The RD is a comparative measure between the probability of the primary infection and the conditional probability of the secondary infection, given the primary infection. The confidence intervals of the score and Bayesian methods have poor coverage performance for small to moderate sample sizes. In this paper, we propose three valid CIs based on the fiducial, IM and MIM approaches for the RD. The fiducial and IM-based CIs have more accurate coverage  performance than the score and Bayesian CIs. Compared with the fiducial approach, IM-based approaches can provide meaningful probabilistic summaries of the information in data concerning the quantity of interest. Moreover, the MIM method uses a randomized IM idea to modify the two inequation associations of IM to an accurate equation model. A real data example shows that the proposed methods work well for the calf immunity data. Different from other approximate approaches, our MIM solution has the advantage that its approximation precision, which only depends on the simulation times rather than the sample size, may tend to be as high as possible whether the sample size is large or small. Moreover, the IM's output is posterior-probabilistic in nature and, therefore, has a meaningful interpretation within and not just across experiments. Moreover, our fiducial and IM-based methods are general ideas that can be applied to infer other risk factors, such as the risk ratio. Finally, since the effect of the observed data cannot be separated from the auxiliary variables, there could be interest in the simultaneous prediction of several auxiliary variables. The best choice of predictive random set needs further study. Supporting information S1 Appendix. (DOCX)